Abstract: Recent breakthrough results in compressed sensing 
(CS) have established that many high dimensional objects can 
be accurately recovered from a relatively small number of non- 
adaptive linear projection observations, provided that the objects 
possess a sparse representation in some basis. Subsequent efforts 
have shown that the performance of CS can be improved by 
exploiting the structure in the location of the non-zero signal 
coefficients (structured sparsity) or using some form of online 
measurement focusing (adaptivity) in the sensing process. In this 
paper we examine a powerful hybrid of these two techniques. 
First, we describe a simple adaptive sensing procedure and show 
that it is a provably effective method for acquiring sparse signals 
that exhibit structured sparsity characterized by tree-based 
coefficient dependencies. Next, employing techniques from sparse 
hierarchical dictionary learning, we show that representations 
exhibiting the appropriate form of structured sparsity can be 
learned from collections of training data. The combination of 
these techniques results in an effective and efficient adaptive 
compressive acquisition procedure. 



I. Introduction 

Motivated in large part by breakthrough results in com- 
pressed sensing (CS), significant attention has been focused 
in recent years on the development and analysis of sampling 
and inference methods that make efficient use of measurement 
resources. The essential idea underlying many directions of 
research in this area is that signals of interest often possess a 
parsimonious representation in some basis or frame. For example,
let $x \in \mathbb{C}^n$ be a (perhaps very high dimensional) vector
which denotes our signal of interest. Suppose that for some
fixed (known) matrix $D$ whose columns are $n$-dimensional
vectors $d_i$, $x$ may be expressed as a linear combination of the
columns of $D$, as

$$ x = \sum_i \alpha_i d_i, \tag{1} $$

where the $\alpha_i$ are the coefficients corresponding to the relative
weight of the contribution of each of the $d_i$ in the representation.
The dictionary $D$ may, for example, consist of all of the
columns of an orthonormal matrix (e.g., a discrete wavelet or
Fourier transform matrix), though other representations may
be possible (e.g., $D$ may be a frame). In any case, we define
the support set $\mathcal{S}$ to be the set of indices corresponding to
the nonzero values of $\alpha_i$ in the representation of $x$. When $|\mathcal{S}|$
is small relative to the ambient dimension $n$, we say that the
signal $x$ is sparse in the dictionary $D$, and we call the vector $\alpha$,
whose entries are the coefficients $\alpha_i$, the sparse representation
of $x$ in the dictionary $D$.

The most general CS observation model prescribes collecting
(noisy) linear measurements of $x$ in the form of projections
of $x$ onto a set of $m$ ($< n$) "test vectors" $\phi_i$. Formally, these
measurements can be expressed as

$$ y_i = \phi_i^T x + w_i, \quad i = 1, 2, \dots, m, \tag{2} $$



where $w_i$ denotes the additive measurement uncertainty associated
with the $i$th measurement. In "classic" CS settings, the
measurements are non-adaptive in nature, meaning that the
$\{\phi_i\}$ are specified independently of $\{y_i\}$ (e.g., the test vectors
can be specified before any measurements are obtained). Initial
breakthrough results in CS establish that for certain choices
of the test vectors, or equivalently the matrix $\Phi$ whose rows
are the test vectors, sparse vectors $x$ can be exactly recovered
(or accurately approximated) from $m \ll n$ measurements. For
example, if $x$ has no more than $k$ nonzero entries, and the
entries of the test vectors/matrix are chosen as iid realizations
of zero-mean random variables having sub-Gaussian distributions,
then only $m = O(k \log(n/k))$ measurements of the
form (2) suffice to exactly recover (if noise free) or accurately
estimate (when $w_i \neq 0$) the unknown vector $x$, with high
probability [1], [2].

Several extensions to the traditional CS paradigm have been 
investigated recently in the literature. One such extension cor- 
responds to exploiting additional structure that may be present 
in the sparse representation of x, which can be quantified as 
follows. Suppose that $\alpha \in \mathbb{R}^p$, the sparse representation of $x$
in an $n \times p$ orthonormal dictionary $D$, has $k$ nonzero entries.
Then, there are generally $\binom{p}{k}$ possible subspaces on which $x$
could be supported, and the space of all $k$-sparse vectors can
be understood as a union of ($k$-dimensional) linear subspaces
[3]. Structured sparsity refers to sparse representations that
are drawn from a restricted union of subspaces (where only
a subset of the $\binom{p}{k}$ subspaces are allowable). Recent works
exploiting structured sparsity in CS reconstruction include [4],
[5]. One particular example of structured sparsity, which will
be our primary focus here, is tree-sparsity. Let $T_{p,d}$ denote
a balanced rooted connected tree of degree $d$ with $p$ nodes.
Suppose that the components of a sparse representation $\alpha \in \mathbb{R}^p$
can be put into a one-to-one correspondence with the nodes of
the tree $T_{p,d}$. We say that the vector $\alpha \in \mathbb{R}^p$ is $k$-tree-sparse in
the tree $T_{p,d}$ if its nonzero components correspond to a rooted
connected subtree of $T_{p,d}$. This type of tree structure arises, for
example, in the wavelet coefficients of many natural images.
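
To make the definition of tree-sparsity concrete, the following minimal sketch checks whether the nonzero entries of a coefficient vector form a rooted connected subtree. It assumes (this is not specified in the text) that the tree is a complete $d$-ary tree stored in level order with 0-based indices, so that the root is node 0 and the parent of node $j > 0$ is $\lfloor (j-1)/d \rfloor$. The function name `is_tree_sparse` is hypothetical.

```python
def is_tree_sparse(alpha, d, tol=0.0):
    """Check whether the support of alpha is a rooted connected subtree.

    Assumed layout (illustrative only): complete d-ary tree in level order,
    root at index 0, parent of node j > 0 at index (j - 1) // d.
    """
    support = {j for j, a in enumerate(alpha) if abs(a) > tol}
    if not support:
        # Convention chosen for this sketch: the empty support is accepted.
        return True
    # A rooted connected subtree contains the root, and every non-root
    # node in it has its parent in it as well.
    return 0 in support and all((j - 1) // d in support for j in support if j > 0)


# Binary tree with 7 nodes: {0, 1, 3} is a root-to-leaf path.
connected = is_tree_sparse([1.0, 2.0, 0.0, 0.5, 0.0, 0.0, 0.0], d=2)
# {0, 3} skips node 1, the parent of node 3, so it is not connected.
disconnected = is_tree_sparse([1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0], d=2)
```

Under this layout the check runs in time linear in the support size, since each node only needs to look up its parent.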

Another extension to the "classic" CS observation model is 
to allow additional flexibility in the measurement process in 
the form of feedback. Sequential adaptive sensing strategies 
are those for which subsequent test vectors $\{\phi_i\}_{i \geq j}$ may
explicitly depend on (or be a function of) past measurements
and test vectors $\{\phi_i, y_i\}_{i<j}$. Adaptive CS procedures have been
shown to provide an improved resilience to noise relative to
traditional CS; see, for example, [6]-[9], as well as the
summary article [10] and the references therein. The essential
idea of these sequential procedures is to gradually "steer" 
measurements towards the subspace in which signal x resides, 
in an effort to increase the signal to noise ratio (SNR) of each 
measurement. 

In this paper we examine a hybrid technique to exploit 
structured sparsity and adaptivity in the context of noisy com- 
pressed sensing. Adaptive sensing techniques that exploit the 
hierarchical tree-structured dependencies present in wavelet 
representations of images have been examined in the context 
of non-Fourier encoding in magnetic resonance imaging [11],
and more recently in the context of compressed sensing
for imaging [12]. Our first contribution here is to quantify
the performance of such procedures when measurements are 
corrupted by zero-mean additive white Gaussian measurement 
noise. Our main theoretical results establish sufficient con- 
ditions (in terms of the number of measurements required, 
and the minimum amplitude of the nonzero components) 
under which the support of tree-sparse vectors may be exactly 
recovered (with high probability) using these adaptive sensing 
techniques. Our results stand in stark contrast with existing 
results for support recovery for (generally unstructured) sparse 
vectors, highlighting the significant improvements that can be 
achieved by the intelligent exploitation of structure throughout 
the measurement process. 

Further, we demonstrate that tree-based adaptive com- 
pressed sensing strategies can be applied with representations 
learned from a collection of training data using recent tech- 
niques in hierarchical dictionary learning. This procedure of 
learning structured sparse representations gives rise to a power- 
ful general-purpose sensing and reconstruction method, which 
we refer to as Learning Adaptive Sensing Representations, 
or LASeR. We demonstrate the performance improvements 
that may be achieved via this approach, relative to other 
compressed sensing methods. 

The remainder of this paper is organized as follows. Section II
provides a discussion of the top-down adaptive compressed
sensing procedure motivated by the approaches in
[11], [12], and contains our main theoretical results, which
quantify the performance of such approaches in noisy settings.
In Section III we discuss the LASeR approach for
extending this adaptive compressed sensing idea to general
compressed sensing applications using recent techniques in
dictionary learning. The performance of the LASeR procedure
is evaluated in Section IV, and conclusions and directions for
future work are discussed in Section V. Finally, a sketch of
the proof of our main result is provided in Section VI.

II. Adaptive CS for Tree Sparse Signals 

Our analysis here pertains to a simple adaptive compressed
sensing procedure for tree-sparse signals, similar to the
techniques proposed in [11], [12]. As above, let $\alpha \in \mathbb{R}^p$ denote the
tree-sparse representation of an unknown signal $x \in \mathbb{R}^n$ in a
known $n \times p$ dictionary $D$ having orthonormal columns. We
assume sequential measurements of the form specified in (2),
where the additive noises $w_i$ are taken to be iid $\mathcal{N}(0, 1)$.

Rather than projecting onto randomly generated test vectors,
here we obtain measurements of $x$ by projecting onto
selectively chosen, scaled versions of columns of the dictionary
$D$, as follows. Without loss of generality suppose that the
index 1 corresponds to the root of the tree $T_{p,d}$. Begin by
initializing a data structure (a stack or queue) with the index
1, and collect a (noisy) measurement of the coefficient $\alpha_1$
according to (2) by selecting $\phi_1 = \beta d_1$, where $\beta > 0$ is a fixed
scaling parameter. That is, obtain a measurement

$$ y = \beta d_1^T x + w. \tag{3} $$

Note that our assumptions on the additive noise imply that
$y \sim \mathcal{N}(\beta \alpha_1, 1)$. Now, perform a significance test to determine
whether the amplitude of the measured value $y$ exceeds a
specified threshold $\tau > 0$. If the measurement is deemed
significant (i.e., $|y| > \tau$), then add the locations of the $d$ children
of index 1 in the tree $T_{p,d}$ to the stack (or queue). In either
case, obtain the next index from the data structure (if the
structure is nonempty) to determine which column of $D$ should
comprise the next test vector, and proceed as above. If the data
structure is empty, the procedure stops. Notice that using a stack
as the data structure results in depth-first traversal of the tree,
while using a queue results in breadth-first traversal. The
aforementioned algorithm is adaptive in the sense that the decision
on which locations of $\alpha$ to measure depends on outcomes of the
statistical tests corresponding to the previous measurements.
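
The procedure above can be sketched in a few lines of code. The sketch below is illustrative rather than a definitive implementation: it assumes a complete $d$-ary tree stored in level order with 0-based indices (root at 0, children of node $j$ at $dj+1, \dots, dj+d$), and it uses the fact that the dictionary has orthonormal columns, so projecting the signal onto a scaled column returns the scaled coefficient and the dictionary itself never needs to be formed. The function name, the `noise_std` and `seed` parameters, and the example values are all hypothetical.

```python
import random
from collections import deque


def adaptive_tree_sense(alpha, d, beta, tau, noise_std=1.0, use_stack=False, seed=0):
    """Top-down adaptive sensing of a tree-sparse coefficient vector.

    alpha holds the coefficients in level order (assumed layout, 0-based).
    Returns the estimated support and the number of measurements taken.
    """
    rng = random.Random(seed)
    p = len(alpha)
    pending = deque([0])  # start with the root of the tree
    support_estimate = []
    num_measurements = 0

    while pending:
        # A stack gives depth-first traversal, a queue breadth-first.
        j = pending.pop() if use_stack else pending.popleft()

        # Noisy measurement of the scaled coefficient beta * alpha[j].
        y = beta * alpha[j] + rng.gauss(0.0, noise_std)
        num_measurements += 1

        # Significance test: only descend below coefficients with |y| > tau.
        if abs(y) > tau:
            support_estimate.append(j)
            for child in range(d * j + 1, d * j + d + 1):
                if child < p:
                    pending.append(child)

    return sorted(support_estimate), num_measurements


# Noise-free example: binary tree with 15 nodes and support {0, 1, 3}.
alpha = [0.0] * 15
for j in (0, 1, 3):
    alpha[j] = 1.0
support, m = adaptive_tree_sense(alpha, d=2, beta=1.0, tau=0.5, noise_std=0.0)
```

In this noise-free example every test is correct, so the procedure measures the three support locations plus the four children that fail the test, for a total of seven measurements.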

The performance of this procedure is quantified by the
following result, which comprises the main theoretical
contribution of this work. A sketch of the proof of the theorem
is given in Sec. VI.

Theorem 1: Let $\alpha$ be $k$-tree-sparse in the tree $T_{p,d}$ with
support set $\mathcal{S}$, and suppose $k < p/d$. For any $c_1 > 0$ and
$c_2 \in (0, 1)$, there exists a constant $c_3 > 0$ such that if

$$ \alpha_{\min} = \min_{i \in \mathcal{S}} |\alpha_i| \geq \sqrt{\frac{c_3 \log k}{\beta^2}} \tag{4} $$

and $\tau = c_2 \beta \alpha_{\min}$, the following hold with probability at least
$1 - k^{-c_1}$: the total number of measurements obtained is $m = dk + 1$,
and the support estimate $\hat{\mathcal{S}}$, comprised of all the measured
locations for which the corresponding measured value exceeds $\tau$
in amplitude, is equal to $\mathcal{S}$.

A brief discussion is in order here to put the results
of this theorem in context. Note that in practical settings,
physical constraints (e.g., power or time limitations) effectively
impose a limit on the precision of the measurements that may
be obtained. This can be modeled by introducing a global
constraint of the form

$$ \sum_{i=1}^{m} \|\phi_i\|_2^2 \leq R \tag{5} $$

on the model (2) in order to limit the "sensing energy" that
may be expended throughout the entire measurement process. In
the context of Thm. 1 this corresponds to a constraint of the
form $\sum_{i=1}^{m} \beta^2 \leq R$. In this case, for the choice

$$ \beta^2 = \frac{R}{(d+1)k}, \tag{6} $$

Thm. 1 guarantees exact support recovery with high probability
from $O(k)$ measurements provided that $\alpha_{\min}$ exceeds
a constant times $\sqrt{(d+1)(k/R) \log k}$. To assess the benefits
of exploiting structure via adaptive sensing, it is illustrative to
compare the result of Thm. 1 with results obtained in several
recent works that examined support recovery for unstructured
sparse signals under a Gaussian noise model. The consistent
theme identified in these works is that exact support recovery is
impossible unless the minimum signal amplitude $\alpha_{\min}$ exceeds
a constant times $\sqrt{(n/R) \log n}$ for non-adaptive measurement
strategies [13], [14], or $\sqrt{(n/R) \log k}$ for adaptive sensing
strategies [15]. Clearly, when the signal being acquired is
sparse ($k \ll n$), the procedure analyzed in this work succeeds
in recovering much weaker signals.
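
As a rough numerical illustration of this comparison, the sketch below evaluates the three amplitude scalings quoted above with every unspecified constant set to one. That choice is an assumption made only for illustration, so the outputs indicate relative orders of magnitude and are not actual thresholds. The function name and the example values of $n$, $k$, $d$ and $R$ are hypothetical.

```python
import math


def amplitude_scalings(n, k, d, R):
    """Minimum-amplitude scalings for support recovery, constants set to 1."""
    # Tree-structured adaptive sensing (the procedure analyzed here).
    tree_adaptive = math.sqrt((d + 1) * (k / R) * math.log(k))
    # Non-adaptive sensing of unstructured sparse signals.
    nonadaptive = math.sqrt((n / R) * math.log(n))
    # Adaptive sensing of unstructured sparse signals.
    adaptive_unstructured = math.sqrt((n / R) * math.log(k))
    return tree_adaptive, nonadaptive, adaptive_unstructured


# Hypothetical setting: n = 128 * 128 coefficients, k = 32 nonzeros,
# binary tree (d = 2), and sensing budget R = n.
tree_adaptive, nonadaptive, adaptive_unstructured = amplitude_scalings(
    n=128 * 128, k=32, d=2, R=128 * 128
)
```

For these values the tree-based scaling is more than an order of magnitude smaller than either unstructured scaling, which reflects the replacement of the factor $n/R$ by $(d+1)k/R$.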

Our proof of Thm. 1 can be extended to obtain guarantees 
on the accuracy of an estimate obtained via a related adaptive 
sensing procedure. 

Corollary 1: There exists a two-stage (support recovery, 
then estimation) adaptive compressed sensing procedure for k- 
tree sparse signals that produces an estimate from m = O(k) 
measurements that (with high probability) satisfies 
provided $\alpha_{\min}$ exceeds a constant times $\sqrt{(k/R) \log k}$.
By comparison, non-adaptive CS estimation techniques that
do not assume any structure in the sparse representation can
achieve estimation error
from $m = O(k \log(n/k))$ measurements [16]. Exploiting
structure in non-adaptive CS results in an estimation
procedure that achieves error
from $m = O(k)$ measurements. Again, we see that the results
of the corollary to Thm. 1 provide a significant improvement
over these existing error bounds, especially in the case when
$k \ll n$.

III. Learning Adaptive Sensing Representations 

The approach outlined above can be applied in general
settings, by employing techniques from dictionary learning
[17], [18]. Let $X$ denote an $n \times q$ matrix whose $n$-dimensional
columns $x_i$ comprise a collection of training data, and suppose
we can find a factorization of $X$ of the form $X \approx DA$, where
$D$ is an $n \times p$ dictionary with orthonormal columns, and $A$ is a
$p \times q$ matrix whose columns $\alpha_i \in \mathbb{R}^p$ each exhibit tree-sparsity
in some tree $T_{p,d}$. The task of finding the dictionary $D$ and
associated coefficient matrix $A$ with tree-sparse columns can
be accomplished by solving an optimization of the form

$$ \{D, A\} = \arg\min_{D, A} \sum_{i=1}^{q} \left( \|x_i - D\alpha_i\|_2^2 + \lambda \Omega(\alpha_i) \right), \tag{10} $$

subject to the constraint $D^T D = I$. Here, the regularization
term is given by

$$ \Omega(\alpha_i) = \sum_{g \in \mathcal{G}} \omega_g \|(\alpha_i)_g\|, \tag{11} $$

where $\mathcal{G}$ is the set of $p$ groups, each comprised of a node with
all of its descendants in the tree $T_{p,d}$, the notation $(\alpha_i)_g$ refers
to the subvector of $\alpha_i$ restricted to the indices in the set $g \in \mathcal{G}$,
the $\omega_g$ are non-negative weights, and the norm can be either
the $\ell_2$ or $\ell_\infty$ norm. Efficient software packages have been
developed (e.g., [19]) for solving optimizations of the form
(10) via alternating minimization over $D$ and $A$. Enforcing
the additional constraint of orthogonality of the columns of $D$
can be achieved in a straightforward manner. In the context
of the procedure outlined in Sec. II, we refer to solving
this form of constrained structured dictionary learning task as
Learning Adaptive Sensing Representations, or LASeR. The
performance of LASeR is evaluated in the next section.
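
The tree-structured regularization term described above (a weighted sum, over the $p$ groups, of the norm of the coefficients in each group) can be evaluated with the short sketch below. It is an illustration under stated assumptions and not the solver used by the software cited above: the tree is assumed to be a complete $d$-ary tree in level order with 0-based indices, the weights default to one, and the names `descendants_group` and `tree_norm` are hypothetical.

```python
import math


def descendants_group(j, p, d):
    """Indices of node j and all of its descendants (assumed level-order tree)."""
    group, frontier = [], [j]
    while frontier:
        node = frontier.pop()
        group.append(node)
        frontier.extend(c for c in range(d * node + 1, d * node + d + 1) if c < p)
    return group


def tree_norm(alpha, d, weights=None, norm="l2"):
    """Weighted sum over all p groups of the l2 or l-infinity norm of alpha on the group."""
    p = len(alpha)
    if weights is None:
        weights = [1.0] * p  # unit weights, an assumption for this sketch
    total = 0.0
    for j in range(p):
        sub = [alpha[i] for i in descendants_group(j, p, d)]
        if norm == "l2":
            value = math.sqrt(sum(a * a for a in sub))
        elif norm == "linf":
            value = max(abs(a) for a in sub)
        else:
            raise ValueError("norm must be 'l2' or 'linf'")
        total += weights[j] * value
    return total


# Three-node binary tree: groups are {0, 1, 2}, {1} and {2}.
omega_l2 = tree_norm([3.0, 4.0, 0.0], d=2)
omega_linf = tree_norm([3.0, 4.0, 0.0], d=2, norm="linf")
```

Because each group contains a node together with all of its descendants, a coefficient deep in the tree appears in every group along its path to the root, which is what encourages a node to be zero whenever its ancestors are zero.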

IV. Experimental Results 

We performed experiments on the Psychological Image
Collection at Stirling [20], which contains a set of 72 man-made
and 91 natural images. The files are in JPG and TIFF format
respectively, with each image of size $256 \times 256$ (here, each of
the images was rescaled to $128 \times 128$ to reduce computational
demands on the dictionary learning procedure). The training
data were then each reshaped to a $16384 \times 1$ vector and
stacked together to form the training matrix $X \in \mathbb{R}^{16384 \times 163}$.
After centering the training data by subtracting the column 
mean of the training matrix from each of the training vectors, 
we learned a balanced binary tree structured orthonormal 
dictionary with 7 levels (comprising 127 orthogonal dictionary 
elements). 

The LASeR sensing procedure was then applied with columns
of the dictionary scaled to meet the total sensing budget $R$ for
two test signals (chosen from the original training set). Since
we specify the sparsity level of the signal in the learned
dictionary during the dictionary learning process, the allocation of
sensing energy to each measurement can be done beforehand
(specifically, $\beta$ is defined as in (6)). We evaluated the
performance of the procedure for various values of $\tau$ (the threshold
for determining significance of a measured coefficient) in a
noisy setting corrupted by zero-mean additive white Gaussian
measurement noise. The reconstruction from the LASeR
procedure is obtained as the column mean plus a weighted sum
of the atoms of the dictionary used to obtain the projections,
where the weights are taken to be the actual observation
values obtained by projecting onto the corresponding atom.
When assessing the performance of the procedure in noisy
settings, we averaged performance over a total of 500 trials
corresponding to different realizations of the random noise.

Fig. 1: Reconstruction SNR vs. number of measurements (best viewed in color) for different sensing energies $R$ and
fixed noise level $\sigma^2 = 1$, for different schemes (LASeR, PCA, direct wavelet sensing, model-based CS and Lasso). Results
in each row correspond to a different test image. Column 1: $R = 128 \times 128$, Column 2: $R = (128 \times 128)/8$, Column 3:
$R = (128 \times 128)/32$. Here, □ is PCA, o is model-based CS, is CS-Lasso, < and o are for direct wavelet sensing with $\tau$ =
and $\tau = 0.5$ respectively. Colored solid lines are for LASeR, with red for $\tau = 0$, green for $\tau = 0.04$, blue for $\tau = 0.06$ and black
for $\tau = 0.1$.

Reconstruction performance is quantified by the reconstruction
signal to noise ratio (SNR), given by

$$ \mathrm{SNR} = 10 \log_{10} \left( \frac{\|x\|_2^2}{\|\hat{x} - x\|_2^2} \right), \tag{12} $$

where $x$ and $\hat{x}$ are the original test signal and the reconstructed
signal, respectively.
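
For reference, a reconstruction SNR of this kind can be computed as in the sketch below. It assumes the usual definition, namely ten times the base-10 logarithm of the ratio of the signal energy to the energy of the reconstruction error, with signals passed as flat sequences of real numbers. The function name and the example values are hypothetical.

```python
import math


def reconstruction_snr_db(x, x_hat):
    """Reconstruction SNR in dB: 10 * log10(||x||^2 / ||x_hat - x||^2)."""
    signal_energy = sum(v * v for v in x)
    error_energy = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Signal energy 25 and error energy 0.25 give a ratio of 100.
snr_db = reconstruction_snr_db([3.0, 4.0], [3.0, 3.5])
```

In an experiment with random noise, this quantity would be averaged over the independent trials.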

To provide a performance comparison for LASeR, we also
evaluate the reconstruction performance of the direct wavelet
sensing algorithm described in [12], as well as principal
component analysis (PCA) based reconstruction. For PCA, the
reconstruction is obtained by taking projections of the test signal
onto the principal components and adding back the subtracted
column mean to the reconstruction. We also compare with
"traditional" compressed sensing and model-based compressed
sensing [5], where measurements are obtained by projecting
onto random vectors (in this case, vectors whose entries are
i.i.d. zero-mean Gaussian distributed) and reconstruction is
obtained via the Lasso and CoSaMP respectively. In order to
make a fair comparison among all of the different strategies,
we scale the test vectors so that the constraint on the total
sensing energy is met.

Reconstruction SNR values vs. number of measurements
for two of the test images are shown in Fig. 1. The results
in the top row (for the first test image) show that for a
range of threshold values $\tau$ one can get a good reconstruction
SNR by taking only 60 to 65 measurements using LASeR
with a very limited sensing budget $R$. On the other hand,
the reconstruction SNR for Lasso and model-based CS degrades as
we decrease the sensing energy $R$. The results in the bottom
row (corresponding to the second test image) demonstrate a
case where the performance of LASeR is on par with PCA. In
this case too, the SNR for Lasso and model-based CS decreases
significantly as we decrease $R$. The advantage of LASeR is in
the low measurement (high threshold) and low sensing budget
scenario, where we can get a good reconstruction from few
measurements.

V. Discussion/Conclusion 

In this paper, we presented a novel sensing and reconstruc- 
tion procedure called LASeR, which uses dictionaries learned 
from training data, in conjunction with adaptive sensing, to 
perform compressed sensing. Bounds on minimum feature 
strength in the presence of measurement noise were explicitly 
proven for LASeR. Simulations demonstrate that the pro- 
posed procedure can provide significant improvements over 
traditional compressed sensing (based on random projection 
measurements), as well as other established methods such as 
PCA. 

Future work in this direction will entail obtaining a complete 
characterization of the performance of the LASeR procedure 
for different dictionaries, and for different learned tree struc- 
tures (we restricted attention here to binary trees, though 
higher degrees can also be obtained via the same procedure).

VI. Proof of Main Result 

Before proceeding with the proof of the main theorem,
we state an intermediate result concerning the number of
measurements that are obtained via the procedure described
in Sec. II when sensing a $k$-tree-sparse vector. We state the
result here as a lemma. The proof is by induction on $k$, and is
straightforward, so we omit it here due to space constraints.

Lemma 1: Let $T_{p,d}$ denote a complete rooted connected
tree of degree $d$ with $p$ nodes, and let $\alpha \in \mathbb{R}^p$ be $k$-tree-sparse
in $T_{p,d}$ with $k < p/d$. If the procedure described in Sec. II is
used to acquire $\alpha$, and the outcome of the statistical test is
correct at each step, then the procedure halts when $m = dk + 1$
measurements have been collected.

In other words, suppose that for a $k$-tree-sparse $\alpha$ with
support $\mathcal{S}$, the outcomes of each of the statistical tests of the
procedure described in Sec. II are correct. Then, the set of
locations that are measured is of the form $\mathcal{S} \cup \mathcal{S}^c$, where $\mathcal{S}$
and $\mathcal{S}^c$ are disjoint, and $|\mathcal{S}^c| = (d - 1)k + 1$.

A. Sketch of Proof of Theorem 1

An error can occur in two different ways, corresponding to
missing a truly significant signal component (a miss hit) and
determining a component to be significant when it is not (a
false alarm). Let $y_{d_j}$ correspond to the measurement obtained
according to the noisy linear model (2) by projecting onto the
column $d_j$. A false alarm corresponds to the event $|y_{d_j}| > \tau$
for some $j \in \mathcal{S}^c$. Since in this case we have $y_{d_j} \sim \mathcal{N}(0, 1)$,
using a standard Gaussian tail bound for zero mean and unit
variance random variables, the probability of false alarm can
be upper bounded as

$$ \Pr(\text{false alarm}) \leq e^{-\tau^2/2}. \tag{13} $$

Likewise, for $j \in \mathcal{S}$, a miss hit corresponds to $|y_{d_j}| < \tau$. Letting
$\alpha_{\min} = \min_{j \in \mathcal{S}} |\alpha_j|$, we have

$$ \Pr(\text{miss hit}) \leq e^{-(\beta \alpha_{\min} - \tau)^2/2} \tag{14} $$

for $\tau < \beta \alpha_{\min}$.

Now, the probability of exact support recovery corresponds
to the probability of the event that each of the $|\mathcal{S}| = k$ statistical
tests corresponding to measurements of nonzero signal
components is correct, as are each of the $|\mathcal{S}^c| = m - k = (d-1)k + 1$
tests corresponding to measurements obtained at locations
where the signal has a zero component. Thus, the probability
of the failure event can be bounded via the union bound, as

$$ \Pr(\text{failure}) \leq |\mathcal{S}^c| \Pr(\text{false alarm}) + |\mathcal{S}| \Pr(\text{miss hit}). \tag{15} $$

Let $\tau = a(\beta \alpha_{\min})$, where $a \in (0, 1)$. If, for some $c_1 > 0$, each
of the terms in the bound above is less than $k^{-c_1}/2$, then the
overall failure probability is upper bounded by $k^{-c_1}$.

Consider the first term on the right hand side of (15). The
condition $(m-k) e^{-\tau^2/2} \leq k^{-c_1}/2$ implies that (for $m = dk+1$),

$$ \alpha_{\min} \geq \sqrt{\frac{2\log((d-1)k+1) + 2c_1 \log k + 2\log 2}{\beta^2 a^2}}. \tag{16} $$

Similarly, the condition $k e^{-(\beta \alpha_{\min} - \tau)^2/2} \leq k^{-c_1}/2$ implies

$$ \alpha_{\min} \geq \sqrt{\frac{2(1+c_1)\log k + 2\log 2}{\beta^2 (1-a)^2}}. \tag{17} $$

There exists a constant $c_3$ (depending on $d$ and $a$) such that
when $\alpha_{\min} \geq \sqrt{c_3 \log(k)/\beta^2}$, both (16) and (17) are satisfied.
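
The two amplitude conditions in the proof sketch can be checked numerically. The sketch below is illustrative only: it computes the larger of the two lower bounds on the minimum amplitude (one from the false alarm term, one from the miss term) and then evaluates the union bound on the failure probability at that amplitude, with the threshold set to the fraction $a$ of the scaled minimum amplitude. The function names and the example parameter values are hypothetical.

```python
import math


def required_alpha_min(k, d, beta, c1, a):
    """Smallest minimum amplitude meeting both conditions of the proof sketch."""
    # False alarm term: ((d-1)k + 1) * exp(-tau^2 / 2) <= k^(-c1) / 2.
    false_alarm = math.sqrt(
        (2 * math.log((d - 1) * k + 1) + 2 * c1 * math.log(k) + 2 * math.log(2))
        / (beta ** 2 * a ** 2)
    )
    # Miss term: k * exp(-(beta * alpha_min - tau)^2 / 2) <= k^(-c1) / 2.
    miss = math.sqrt(
        (2 * (1 + c1) * math.log(k) + 2 * math.log(2))
        / (beta ** 2 * (1 - a) ** 2)
    )
    return max(false_alarm, miss)


def union_bound(k, d, beta, a, alpha_min):
    """Union bound on the failure probability with tau = a * beta * alpha_min."""
    tau = a * beta * alpha_min
    false_alarm_term = ((d - 1) * k + 1) * math.exp(-tau ** 2 / 2)
    miss_term = k * math.exp(-(beta * alpha_min - tau) ** 2 / 2)
    return false_alarm_term + miss_term


# Hypothetical parameters: k = 32, binary tree, unit scaling, c1 = 1, a = 1/2.
k, d, beta, c1, a = 32, 2, 1.0, 1.0, 0.5
alpha_min = required_alpha_min(k, d, beta, c1, a)
failure_bound = union_bound(k, d, beta, a, alpha_min)
```

At the returned amplitude each term of the union bound is at most $k^{-c_1}/2$, so `failure_bound` does not exceed $k^{-c_1}$ (up to floating-point rounding).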